Collocation Translation Acquisition Using Monolingual Corpora

نویسندگان

  • Yajuan Lü
  • Ming Zhou
چکیده

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency triple translation model is estimated using the EM algorithm based on a dependency correspondence assumption. The generated triple translation model is used to extract collocation translations from two monolingual corpora. Experiments show that our approach outperforms the existing monolingual corpus based methods in dependency triple translation and achieves promising results in collocation translation extraction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Synonymous Collocation Extraction Using Translation Information

Automatically acquiring synonymous collocation pairs such as and from corpora is a challenging task. For this task, we can, in general, have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or use bilingual corpora alone are apparently inadequate because of low precision or low coverage. I...

متن کامل

Improving Translation Selection with a New Translation Model Trained by Independent Monolingual Corpora

We propose a novel statistical translation model to improve translation selection of collocation. In the statistical approach that has been popularly applied for translation selection, bilingual corpora are used to train the translation model. However, there exists a formidable bottleneck in acquiring large-scale bilingual corpora, in particular for language pairs involving Chinese. In this pap...

متن کامل

Improving Statistical Machine Translation with Monolingual Collocation

This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. The experimental results show that our method improves the performance of...

متن کامل

Collocation Extraction Using Monolingual Word Alignment Method

Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to monolingual scenario to extract collocations from monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. Then the mon...

متن کامل

Identifying collocations using cross-lingual association measures

We introduce a simple and effective crosslingual approach to identifying collocations. This approach is based on the observation that true collocations, which cannot be translated word for word, will exhibit very different association scores before and after literal translation. Our experiments in Japanese demonstrate that our cross-lingual association measure can successfully exploit the combi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004